Distilling Knowledge from an Ensemble of Models for Punctuation Prediction
Authors
Abstract
This paper proposes an approach for distilling knowledge from an ensemble of models into a single deep neural network (DNN) student model for punctuation prediction. The approach trains the DNN student model to mimic the behavior of the ensemble, which consists of three single models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN student model and that of the ensemble. Experimental results on the English IWSLT2011 dataset show that the ensemble outperforms the previous state-of-the-art model by up to 4.0% absolute in overall F1-score. The DNN student model also achieves an improvement of up to 13.4% absolute in overall F1-score over the conventionally trained baseline models.
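The distillation objective in the abstract can be illustrated with a minimal sketch. It assumes that the ensemble's output distribution is the uniform average of its three members' per-token class probabilities, and that the loss is KL(ensemble || student) computed per token and averaged over a sequence. The abstract does not specify how the ensemble outputs are combined or which direction of KL is used, so both are assumptions here. The label set, function names, and example numbers are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical punctuation label set for illustration only; not taken from the paper.
LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_distribution(teacher_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities of several teachers (uniform weights assumed)."""
    n = len(teacher_probs)
    num_classes = len(teacher_probs[0])
    return [sum(p[k] for p in teacher_probs) / n for k in range(num_classes)]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_k p_k * log(p_k / q_k); classes with p_k == 0 contribute nothing."""
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0.0)


def distillation_loss(teacher_probs: Sequence[Sequence[float]],
                      student_logits: Sequence[float]) -> float:
    """Per-token loss: KL divergence from the ensemble's distribution to the student's."""
    target = ensemble_distribution(teacher_probs)
    student = softmax(student_logits)
    return kl_divergence(target, student)


def sequence_loss(teacher_probs_per_token: Sequence[Sequence[Sequence[float]]],
                  student_logits_per_token: Sequence[Sequence[float]]) -> float:
    """Mean per-token distillation loss over one token sequence."""
    losses = [distillation_loss(t, s)
              for t, s in zip(teacher_probs_per_token, student_logits_per_token)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Three hypothetical teachers' probabilities for a single token.
    teachers = [
        [0.70, 0.20, 0.08, 0.02],
        [0.60, 0.30, 0.05, 0.05],
        [0.65, 0.25, 0.07, 0.03],
    ]
    student_logits = [2.0, 1.0, 0.0, -1.0]
    print(ensemble_distribution(teachers))
    print(distillation_loss(teachers, student_logits))
```

Averaging the teachers' probabilities (rather than their logits) yields a valid distribution directly. Minimizing KL(ensemble || student) with respect to the student is equivalent to minimizing cross-entropy against the ensemble's soft targets, because the ensemble's entropy does not depend on the student's parameters.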
Similar resources
Distilling the Knowledge in a Neural Network
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions [3]. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large n...
Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability
Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. In this regard, an important issue is that preparing some features may require difficult and costly experiments, while these features have little impact on the final decision and can be dropped from the feature set. Therefore, developing a machine for p...
Zoning of flood hazard in Nowshahr city using machine learning models
The aim of this study is to predict and model flood hazard in the city of Nowshahr, Mazandaran province, using machine learning models. The criteria and indicators affecting flood hazard were identified through a literature review; the indicators were then converted into rasters in the ArcGIS environment and finally standardized with a fuzzy method for use in the models. K-nearest neighbor ...
Machine learning algorithms in air quality modeling
Modern studies in environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning have emerged as a solution to these issues. It is a fact that input variable type largely affec...
Predicting distribution of Eurasian Lynx (Lynx lynx) using an ensemble modeling approach: A Case Study: Saveh Zarandieh Kharaghan Area, Markazi Province
Adequate knowledge about suitable wildlife habitats is essential for preventing habitat destruction and species extinction, and for species conservation and management. The Eurasian lynx is one of the most widely distributed cats in Asia. In this study, we applied an ensemble habitat suitability modeling approach, using ten predictor variables to model Eurasian Lynx’s habitat suitability in Saveh Za...